# Public transit ridership refers to the number of people who use public transportation services, such as buses, trains, subways, trams, and other forms of mass transit, to travel from one place to another. It is an important metric to assess the usage and popularity of public transportation systems in a given area. Understanding public transit ridership is crucial for transportation authorities, city planners, and policymakers because it can help inform decisions about funding, service improvements, and the overall effectiveness of a public transit system.
library(readr)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2 3.5.1 ✔ fma 2.5
## ✔ forecast 8.24.0 ✔ expsmooth 2.3
## Warning: package 'forecast' was built under R version 4.4.1
##
library(TTR)
TRANSIT <- read_csv("TRANSIT.csv")
## Rows: 283 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): TRANSIT
## date (1): DATE
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Transit_Raw <- TRANSIT$TRANSIT
Transit_ts <- ts(Transit_Raw, frequency = 12, start = c(2000,1))
plot(Transit_ts)

# Plot and Inference
# Time Series Plot
plot(Transit_ts,
main = paste("Time Series Plot for Usage of Public Transportation Services"),
xlab = "Year",
ylab = "Public Transportation Service Usage",
col = "black",
lwd = 2)

# Observation: The number of people using public transportation stayed steady oscillating around 800,000, however in 2020 the number of people using public transport suddenly declined. This is due to the onset of COVID-19 and many people either losing their jobs or switching to work from home, which led to the decline.
# Central Tendency
summary(Transit_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 171450 772502 824766 779435 866164 993437
boxplot(Transit_ts,
main = "Boxplot for the Usage of Public Transportation Services",
ylab = "Value")

# Observation: Public transportation usage was stable and seasonally yet mildly fluctuating until 2020. There was a dramatic decline starting around 2020 due to COVID-19 pandemic and reduced mobility, which led to the huge number of lower outliers observed in the boxplot during the 2020-2021. This distribution would be negatively skewed (left-skewed) because of the sharp dip values. Post-COVID recovery after 2021 has been gradually increasing but overall it is still staying before the pre-2020 levels, this can be due to the permanent change in the work culture and most people working on hybrid schedules. The median for the boxplot is ~824,766 users indicating the central typical level of public transportation usage and the IQR between about 772,502 (Q1) and 866,164 (Q3), showing moderate variability around the median.
# Decomposition
stl_decomp <- stl(Transit_ts,s.window ="periodic")
plot(stl_decomp)

# The time series is seasonal because it does have seasonal fluctuations displayed by the regular, repeating oscillations. The public transportation usage follows a consistent cyclical pattern over time with similar rises and falls occurring at regular intervals of the seasonal component. The amplitude of the seasonal fluctuations is also relatively stable, even though the overall trend drops sharply around 2020.
decomp_transit <- decompose(Transit_ts)
decomp_transit
## $x
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2000 724934 756536 843730 757458 818246 793911 743746 787064 802516 851255
## 2001 800293 761213 846702 803744 850680 807301 785014 809941 768513 871470
## 2002 794010 756102 820151 824757 838274 772040 782849 781609 805016 877109
## 2003 778679 732135 820157 810703 799984 754298 767411 745756 807125 830771
## 2004 740829 750265 843276 803337 781133 784558 754430 772963 814446 845387
## 2005 756467 750973 842424 816425 806149 795101 753939 808031 858809 858662
## 2006 793138 761980 874469 806944 870445 827752 790273 840294 850328 907084
## 2007 838203 796569 934752 884098 993437 876764 860196 908061 903028 946754
## 2008 852130 835901 889940 908295 912568 884081 910651 890570 929094 971677
## 2009 827360 806287 897860 874039 856958 848878 843229 825429 875511 917949
## 2010 788830 746114 893899 869986 853057 842674 819152 844979 864777 888591
## 2011 793606 780318 907771 856803 880563 861412 824766 843294 891667 919419
## 2012 843287 859851 913814 869911 900348 851686 848905 893636 877531 924987
## 2013 864280 813159 888516 907515 908342 851490 868345 888314 900754 972814
## 2014 832033 812275 913232 920884 923901 874651 886186 887484 931357 986733
## 2015 818017 779983 907646 901828 871727 878666 890344 840937 897493 953141
## 2016 796025 832503 907384 871083 879877 867550 809114 864510 884140 889002
## 2017 802693 777821 886230 834677 875749 842737 790814 837928 844867 910849
## 2018 788672 766031 841125 829115 857964 822283 797080 833069 816289 921343
## 2019 773669 740735 832480 853038 861455 800261 822014 835988 846274 915445
## 2020 815015 788599 505660 171450 200586 256196 306701 318609 331020 351338
## 2021 292833 275803 348362 352403 374081 409969 423379 428647 463998 500798
## 2022 392741 427514 512795 507034 517434 522856 495785 533257 563825 583121
## 2023 525669 512739 593360 560127 609180 572834 532206
## Nov Dec
## 2000 816120 755040
## 2001 820117 756221
## 2002 788746 754257
## 2003 724971 758869
## 2004 797405 767658
## 2005 825130 760996
## 2006 851564 804029
## 2007 870712 808452
## 2008 839026 835899
## 2009 830535 815176
## 2010 837848 812769
## 2011 862730 839221
## 2012 863060 824694
## 2013 862788 837701
## 2014 841871 864282
## 2015 848427 850726
## 2016 839892 807517
## 2017 831770 773762
## 2018 812423 763255
## 2019 814066 791316
## 2020 311158 309585
## 2021 469267 450587
## 2022 549008 520241
## 2023
##
## $seasonal
## Jan Feb Mar Apr May Jun
## 2000 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2001 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2002 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2003 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2004 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2005 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2006 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2007 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2008 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2009 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2010 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2011 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2012 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2013 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2014 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2015 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2016 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2017 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2018 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2019 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2020 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2021 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2022 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## 2023 -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## Jul Aug Sep Oct Nov Dec
## 2000 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2001 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2002 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2003 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2004 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2005 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2006 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2007 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2008 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2009 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2010 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2011 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2012 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2013 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2014 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2015 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2016 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2017 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2018 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2019 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2020 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2021 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2022 -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
## 2023 -20385.6561
##
## $trend
## Jan Feb Mar Apr May Jun Jul Aug
## 2000 NA NA NA NA NA NA 790686.3 794021.1
## 2001 803859.0 806531.7 806068.1 805493.6 806502.5 806718.2 806505.6 806030.9
## 2002 801294.0 800023.2 800363.7 802119.6 801047.5 799658.5 798937.9 797300.5
## 2003 789818.6 787681.5 786275.5 784432.6 779844.5 777379.4 775994.5 775172.8
## 2004 777650.9 778243.6 779682.3 780596.3 784223.4 787607.7 788625.5 789306.6
## 2005 793298.5 794739.2 798048.9 800450.5 802158.8 803036.4 804286.8 806273.4
## 2006 818205.2 821063.4 822054.3 823718.5 826837.5 829732.0 833402.7 836721.6
## 2007 866863.0 872600.1 877619.6 881468.3 883919.1 884901.2 885665.8 887884.9
## 2008 883778.8 885152.3 885509.6 887634.1 887352.3 887175.7 887287.2 885021.2
## 2009 871215.7 865692.2 860745.4 856274.1 853681.6 852464.4 849995.5 845882.9
## 2010 840862.5 840673.9 841041.2 839370.8 838452.2 838656.6 838755.3 840379.5
## 2011 845949.7 846113.4 847163.6 849568.5 851889.8 854028.7 857200.9 862584.8
## 2012 869338.6 872442.0 873950.6 873593.6 873839.3 873247.8 873517.2 872446.4
## 2013 872986.2 873574.5 874320.4 877280.8 879262.2 879792.9 878991.2 877610.8
## 2014 884717.7 885426.5 886667.0 888522.1 888230.5 888466.5 888990.1 887060.6
## 2015 879821.6 878055.4 874704.9 871894.2 870767.8 870476.1 868994.9 870266.9
## 2016 866239.6 863837.2 864263.0 861034.2 858006.1 855850.1 854327.6 852327.0
## 2017 842077.7 840207.6 837463.6 836737.5 837309.4 835564.5 833573.9 832498.4
## 2018 824859.4 824918.0 823524.8 822771.3 822402.5 821158.5 820095.6 818416.5
## 2019 818130.3 819290.9 820661.9 821665.5 821488.2 822725.9 825617.8 829334.9
## 2020 625412.7 582383.9 539357.5 494384.1 449925.2 408898.5 367068.8 323944.8
## 2021 336683.4 346129.9 356255.6 368023.8 380839.2 393302.2 403340.1 413824.2
## 2022 471104.4 478480.1 486998.3 494587.9 501340.5 507565.3 516006.2 525096.0
## 2023 553113.0 NA NA NA NA NA NA
## Sep Oct Nov Dec
## 2000 794339.8 796392.2 799672.2 801581.6
## 2001 804711.6 804480.9 804839.5 802853.4
## 2002 796302.1 795716.7 793535.7 791201.1
## 2003 776891.5 777547.9 776455.5 776930.9
## 2004 789300.6 789810.4 791398.1 792879.7
## 2005 808067.2 809007.4 811291.3 815330.8
## 2006 840674.6 846401.2 854740.6 861907.4
## 2007 887656.6 886797.6 884436.3 881371.6
## 2008 884117.3 883020.0 879275.6 875491.7
## 2009 843210.6 842876.7 842545.3 842124.2
## 2010 842382.7 842411.4 843008.2 844935.0
## 2011 866150.5 866948.4 868319.0 868738.1
## 2012 869446.8 869959.6 871859.5 872184.4
## 2013 878603.7 880190.6 881396.0 883009.3
## 2014 885482.3 884455.6 881487.7 879481.0
## 2015 872444.3 871152.4 870210.9 870087.3
## 2016 849167.2 846768.8 845079.9 843874.0
## 2017 830127.8 828016.7 827043.9 825450.6
## 2018 817002.3 817638.9 818781.1 818009.0
## 2019 817711.7 775694.7 719759.0 669553.5
## 2020 296024.2 297009.8 311778.5 325414.6
## 2021 426996.9 440291.2 452707.2 463383.9
## 2022 532003.9 537573.0 543607.9 549513.1
## 2023
##
## $random
## Jan Feb Mar Apr May
## 2000 NA NA NA NA NA
## 2001 23569.96222 438.48833 9253.12848 26.82355 29638.31219
## 2002 19852.00389 1835.94666 -11593.45485 24413.82355 22687.31219
## 2003 15996.37889 -9789.26167 2500.79515 28046.86522 5600.22886
## 2004 -9685.91278 17778.57166 32212.96181 24517.11522 -17629.64614
## 2005 -9695.57944 1990.94666 12994.37848 17750.99022 -10549.02114
## 2006 2068.79556 -13326.17834 21033.96181 -14998.05145 29068.27052
## 2007 -1524.07944 -30273.92834 25751.67015 4406.11522 94978.68719
## 2008 -4512.82944 -3494.09501 -26950.32985 22437.32355 10676.43719
## 2009 -16719.70444 -13648.01167 5733.87848 19541.36522 -11262.85448
## 2010 -24896.57944 -48802.72001 21477.00348 32391.69855 65.56219
## 2011 -25207.70444 -20038.17834 29226.67015 9010.94855 14134.02052
## 2012 1084.33722 33166.19666 8482.67015 -1906.13478 11969.43719
## 2013 18429.71222 -14658.30334 -17185.12152 32010.65689 14540.52052
## 2014 -25548.74611 -27394.30334 -4815.78819 34138.32355 21131.22886
## 2015 -34668.62111 -52315.17834 1560.33681 31710.19855 -13579.97948
## 2016 -43078.62111 14422.98833 11740.21181 11825.24022 7331.64552
## 2017 -12248.70444 -16629.38667 17385.62848 -284.09311 23900.35386
## 2018 -9051.45444 -13129.84501 -13780.57985 8120.11522 21022.31219
## 2019 -17325.37111 -32798.67834 -19562.62152 33148.94855 25427.56219
## 2020 216738.25389 251972.32166 -65078.24652 -321157.67645 -263878.39614
## 2021 -16714.45444 -24569.72001 -39274.32985 -13844.38478 -21297.43781
## 2022 -51227.45444 -5208.88667 -5584.03819 14222.57355 1554.22886
## 2023 -308.07944 NA NA NA NA
## Jun Jul Aug Sep Oct
## 2000 NA -26554.63560 -6767.46531 -14616.70807 -15146.43814
## 2001 11525.15499 -1105.96894 4099.78469 -58991.49973 -3020.06314
## 2002 -16676.13667 4296.78106 -15501.79865 -14078.95807 11383.06186
## 2003 -12139.05334 11802.15606 -29227.17365 7440.58360 -16786.10480
## 2004 7892.65499 -13809.84394 -16153.92365 2352.54193 -14432.60480
## 2005 3006.94666 -29962.13560 1947.28469 27948.91693 -20354.56314
## 2006 8962.40499 -22744.05227 3762.03469 -13139.49973 -9326.35480
## 2007 2805.15499 -5084.13560 20365.74302 -7421.45807 -10052.81314
## 2008 7847.65499 43749.40606 5738.40969 22183.79193 18647.81186
## 2009 7355.98833 13619.15606 -20264.21531 9507.50027 5063.10353
## 2010 14959.73833 782.32273 4789.15969 -398.54140 -23829.56314
## 2011 18325.69666 -12049.21894 -19101.13198 2723.66693 -17538.60480
## 2012 -10619.42834 -4226.55227 21379.24302 -14708.70807 -14981.77147
## 2013 -17360.51167 9739.44773 10892.90969 -642.62473 22614.18686
## 2014 -2873.17834 17581.57273 613.07635 23081.79193 32268.22853
## 2015 19132.27999 41734.73940 -29140.25698 2255.79193 11979.43686
## 2016 22642.23833 -24827.92727 12372.65969 12179.95860 -27776.02147
## 2017 18114.82166 -22374.21894 5619.24302 -8053.66640 12823.14520
## 2018 12066.82166 -2629.96894 14842.15969 -23506.16640 33694.93686
## 2019 -11522.51167 16781.82273 6842.74302 5769.37527 69741.06186
## 2020 -141760.17834 -39982.17727 -5146.09031 12202.95860 -15680.97980
## 2021 27609.19666 40424.57273 15012.45135 14208.25027 -9502.39647
## 2022 26233.02999 164.40606 8350.70135 9028.25027 -24461.14647
## 2023 NA NA
## Nov Dec
## 2000 19052.98759 -16612.06857
## 2001 17882.73759 -16702.86024
## 2002 -2184.51241 -7014.56857
## 2003 -48879.30408 11867.59809
## 2004 8612.15425 4707.80643
## 2005 16443.90425 -24405.27691
## 2006 -571.34575 -27948.90191
## 2007 -11119.05408 -42990.11024
## 2008 -37644.34575 -9663.19357
## 2009 -9405.05408 2981.26476
## 2010 -2554.92908 -2236.48524
## 2011 -2983.72075 412.43143
## 2012 -6194.26241 -17560.90191
## 2013 -16002.72075 -15378.77691
## 2014 -37011.42908 14730.47309
## 2015 -19178.67908 10568.18143
## 2016 -2582.67908 -6427.52691
## 2017 7331.36259 -21759.06857
## 2018 -3752.88741 -24824.48524
## 2019 96912.19592 151692.05643
## 2020 1984.77925 14099.88976
## 2021 19165.02925 17132.63976
## 2022 8005.32092 657.43143
## 2023
##
## $figure
## [1] -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## [7] -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
##
## $type
## [1] "additive"
##
## attr(,"class")
## [1] "decomposed.ts"
# The decomposition is additive.
monthly_indices <- decomp_transit$seasonal[1:12]
monthly_indices
## [1] -27135.9622 -45757.1967 31380.7465 -1776.4486 14539.2295 -10942.3633
## [7] -20385.6561 -189.6597 22792.8747 70009.1881 -2605.2376 -29929.5148
# February has the lowest monthly index (-45757.1967) which could possibly be the winter effect and October has the highest monthly index (70009.1881) which could be due to back-to school/work pattern changes.
# February was lower due to mid-winter effects discourage travel, it is also a shorter month and therefore fewer commuting days. In the recent years, there is more flexibility for remote work arrangements in the colder months reducing the transit usage. Additionally, in this month people are less likely to go out for social or recreational activities.
# October has higher because both work and school are in high peak leading to higher usage of transit. It is also a full work/school month having most number of commuting days compared to the holiday winter months. It also has milder weather, making it favorable and more attractive to use public transportation for fall events, sports, and social activities which would increase mobility and higher usage of the public transportation services.
seasadj <- seasadj(stl_decomp)
seasadj
## Jan Feb Mar Apr May Jun Jul Aug
## 2000 753278.5 801529.0 811357.6 760753.0 802351.7 804104.2 765494.7 787285.0
## 2001 828637.5 806206.0 814329.6 807039.0 834785.7 817494.2 806762.7 810162.0
## 2002 822354.5 801095.0 787778.6 828052.0 822379.7 782233.2 804597.7 781830.0
## 2003 807023.5 777128.0 787784.6 813998.0 784089.7 764491.2 789159.7 745977.0
## 2004 769173.5 795258.0 810903.6 806632.0 765238.7 794751.2 776178.7 773184.0
## 2005 784811.5 795966.0 810051.6 819720.0 790254.7 805294.2 775687.7 808252.0
## 2006 821482.5 806973.0 842096.6 810239.0 854550.7 837945.2 812021.7 840515.0
## 2007 866547.5 841562.0 902379.6 887393.0 977542.7 886957.2 881944.7 908282.0
## 2008 880474.5 880894.0 857567.6 911590.0 896673.7 894274.2 932399.7 890791.0
## 2009 855704.5 851280.0 865487.6 877334.0 841063.7 859071.2 864977.7 825650.0
## 2010 817174.5 791107.0 861526.6 873281.0 837162.7 852867.2 840900.7 845200.0
## 2011 821950.5 825311.0 875398.6 860098.0 864668.7 871605.2 846514.7 843515.0
## 2012 871631.5 904844.0 881441.6 873206.0 884453.7 861879.2 870653.7 893857.0
## 2013 892624.5 858152.0 856143.6 910810.0 892447.7 861683.2 890093.7 888535.0
## 2014 860377.5 857268.0 880859.6 924179.0 908006.7 884844.2 907934.7 887705.0
## 2015 846361.5 824976.0 875273.6 905123.0 855832.7 888859.2 912092.7 841158.0
## 2016 824369.5 877496.0 875011.6 874378.0 863982.7 877743.2 830862.7 864731.0
## 2017 831037.5 822814.0 853857.6 837972.0 859854.7 852930.2 812562.7 838149.0
## 2018 817016.5 811024.0 808752.6 832410.0 842069.7 832476.2 818828.7 833290.0
## 2019 802013.5 785728.0 800107.6 856333.0 845560.7 810454.2 843762.7 836209.0
## 2020 843359.5 833592.0 473287.6 174745.0 184691.7 266389.2 328449.7 318830.0
## 2021 321177.5 320796.0 315989.6 355698.0 358186.7 420162.2 445127.7 428868.0
## 2022 421085.5 472507.0 480422.6 510329.0 501539.7 533049.2 517533.7 533478.0
## 2023 554013.5 557732.0 560987.6 563422.0 593285.7 583027.2 553954.7
## Sep Oct Nov Dec
## 2000 779732.2 781205.4 818560.6 784904.2
## 2001 745729.2 801420.4 822557.6 786085.2
## 2002 782232.2 807059.4 791186.6 784121.2
## 2003 784341.2 760721.4 727411.6 788733.2
## 2004 791662.2 775337.4 799845.6 797522.2
## 2005 836025.2 788612.4 827570.6 790860.2
## 2006 827544.2 837034.4 854004.6 833893.2
## 2007 880244.2 876704.4 873152.6 838316.2
## 2008 906310.2 901627.4 841466.6 865763.2
## 2009 852727.2 847899.4 832975.6 845040.2
## 2010 841993.2 818541.4 840288.6 842633.2
## 2011 868883.2 849369.4 865170.6 869085.2
## 2012 854747.2 854937.4 865500.6 854558.2
## 2013 877970.2 902764.4 865228.6 867565.2
## 2014 908573.2 916683.4 844311.6 894146.2
## 2015 874709.2 883091.4 850867.6 880590.2
## 2016 861356.2 818952.4 842332.6 837381.2
## 2017 822083.2 840799.4 834210.6 803626.2
## 2018 793505.2 851293.4 814863.6 793119.2
## 2019 823490.2 845395.4 816506.6 821180.2
## 2020 308236.2 281288.4 313598.6 339449.2
## 2021 441214.2 430748.4 471707.6 480451.2
## 2022 541041.2 513071.4 551448.6 550105.2
## 2023
plot(seasadj)
lines(Transit_ts, col="Red")

# Seasonality does not have big fluctuations in the values of the time series because it is stable so even without seasonality, the graph follows the original time series plot.
# Naive
naive_model <- naive(Transit_ts)
plot(naive_model)

checkresiduals(naive_model)

##
## Ljung-Box test
##
## data: Residuals from Naive method
## Q* = 281.61, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
naive_residuals <- residuals(naive_model)
naive_residuals
## Jan Feb Mar Apr May Jun Jul Aug Sep
## 2000 NA 31602 87194 -86272 60788 -24335 -50165 43318 15452
## 2001 45253 -39080 85489 -42958 46936 -43379 -22287 24927 -41428
## 2002 37789 -37908 64049 4606 13517 -66234 10809 -1240 23407
## 2003 24422 -46544 88022 -9454 -10719 -45686 13113 -21655 61369
## 2004 -18040 9436 93011 -39939 -22204 3425 -30128 18533 41483
## 2005 -11191 -5494 91451 -25999 -10276 -11048 -41162 54092 50778
## 2006 32142 -31158 112489 -67525 63501 -42693 -37479 50021 10034
## 2007 34174 -41634 138183 -50654 109339 -116673 -16568 47865 -5033
## 2008 43678 -16229 54039 18355 4273 -28487 26570 -20081 38524
## 2009 -8539 -21073 91573 -23821 -17081 -8080 -5649 -17800 50082
## 2010 -26346 -42716 147785 -23913 -16929 -10383 -23522 25827 19798
## 2011 -19163 -13288 127453 -50968 23760 -19151 -36646 18528 48373
## 2012 4066 16564 53963 -43903 30437 -48662 -2781 44731 -16105
## 2013 39586 -51121 75357 18999 827 -56852 16855 19969 12440
## 2014 -5668 -19758 100957 7652 3017 -49250 11535 1298 43873
## 2015 -46265 -38034 127663 -5818 -30101 6939 11678 -49407 56556
## 2016 -54701 36478 74881 -36301 8794 -12327 -58436 55396 19630
## 2017 -4824 -24872 108409 -51553 41072 -33012 -51923 47114 6939
## 2018 14910 -22641 75094 -12010 28849 -35681 -25203 35989 -16780
## 2019 10414 -32934 91745 20558 8417 -61194 21753 13974 10286
## 2020 23699 -26416 -282939 -334210 29136 55610 50505 11908 12411
## 2021 -16752 -17030 72559 4041 21678 35888 13410 5268 35351
## 2022 -57846 34773 85281 -5761 10400 5422 -27071 37472 30568
## 2023 5428 -12930 80621 -33233 49053 -36346 -40628
## Oct Nov Dec
## 2000 48739 -35135 -61080
## 2001 102957 -51353 -63896
## 2002 72093 -88363 -34489
## 2003 23646 -105800 33898
## 2004 30941 -47982 -29747
## 2005 -147 -33532 -64134
## 2006 56756 -55520 -47535
## 2007 43726 -76042 -62260
## 2008 42583 -132651 -3127
## 2009 42438 -87414 -15359
## 2010 23814 -50743 -25079
## 2011 27752 -56689 -23509
## 2012 47456 -61927 -38366
## 2013 72060 -110026 -25087
## 2014 55376 -144862 22411
## 2015 55648 -104714 2299
## 2016 4862 -49110 -32375
## 2017 65982 -79079 -58008
## 2018 105054 -108920 -49168
## 2019 69171 -101379 -22750
## 2020 20318 -40180 -1573
## 2021 36800 -31531 -18680
## 2022 19296 -34113 -28767
## 2023
plot(naive_residuals,
main = "Residuals from Naive Forecast for the Usage of Public Transportation Services",
ylab = "Residuals", xlab = "Time",
col = "blue", type = "l")
abline(h = 0, col = "red")

# The plot indicates that the model captured the main pattern of the public transportation usage, because the pattern oscillates over 800,000 transit users except for the sharp drop around 2020 which is a huge negative deviation from the norm.
hist(naive_residuals,
main = "Histogram of Residuals",
xlab = "Residuals",
col = "lightblue")

# The histogram roughly forms a bell-shaped curve centered around 0, however due to the sudden drop around 2020 and the gradual recovery is still below the pre-2020 levels explaining the lower outliers in the boxplot earlier, resulting in the negatively skewed AKA left skewed distribution of the histogram.
plot(fitted(naive_model), residuals(naive_model),
main = "Residuals vs Fitted Values",
xlab = "Fitted Values",
ylab = "Residuals",
col = "steelblue", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# The cluster at the higher values suggests that the data points around and above 800,000 is where the model consistently predicts well and captures a stable pattern which indicates smaller variation. The random scatter at lower fitted values could indicate greater variation in those years due to the external sudden shock like the COVID-19 pandemic in 2020 and the post-pandemic recovery years. At the lower values, we observe heteroscedasticity or non-constant variance because the scatterplot is wider and more random at those values.
plot(Transit_ts, residuals(naive_model),
main = "Residuals vs Actual Values",
xlab = "Actual Values",
ylab = "Residuals",
col = "navy", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# This plot is similar to the fitted vs. residuals plot meaning that the model fits the data well. It indicates higher variation at lower values and smaller variation at higher values during the stable years. The clustering around the higher values indicates there is a model misfit in this case due to heteroscedasticity.
Acf(naive_residuals)

# The bars outside of the blue dashed line which is the approximate 95% confidence interval, indicate significantly autocorrelation at that lag. There are significant spikes at lag 12 and lag 24 which suggests that there are patterns that are left unexplained by the naive model. The spikes at some lags indicate seasonality or external shocks which might be better captured by other models.
accuracy(naive_model)
## ME RMSE MAE MPE MAPE MASE
## Training set -683.4326 55803.6 41483.85 -0.6755225 6.063339 0.7298063
## ACF1
## Training set -0.1325041
naive_forecast <- naive(Transit_ts,12)
naive_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2023 532206 460690.8 603721.2 422832.9 641579.1
## Sep 2023 532206 431068.2 633343.8 377529.1 686882.9
## Oct 2023 532206 408338.0 656074.0 342766.3 721645.7
## Nov 2023 532206 389175.6 675236.4 313459.9 750952.1
## Dec 2023 532206 372293.2 692118.8 287640.4 776771.6
## Jan 2024 532206 357030.3 707381.7 264297.8 800114.2
## Feb 2024 532206 342994.6 721417.4 242832.1 821579.9
## Mar 2024 532206 329930.5 734481.5 222852.3 841559.7
## Apr 2024 532206 317660.4 746751.6 204086.8 860325.2
## May 2024 532206 306055.1 758356.9 186338.0 878074.0
## Jun 2024 532206 295016.9 769395.1 169456.6 894955.4
## Jul 2024 532206 284470.1 779941.9 153326.6 911085.4
plot(naive_forecast)

# The MAPE for this type of forecast is 6.063339 which is less than 10%, so this is a good indicator that the naive forecast is overall performing well especially in the pre-COVID-19 time, and the data does not have extremely small values that impact the percentages except for the years of 2020-2021.
# In 1 year, the time series value for the usage of public transportation services will be 532206.
# The forecast is taking the last value and predicting the values for the next whole year for the point level forecast.
# Simple Moving Averages
plot(Transit_ts,
main = paste("Time Series Plot for the Usage of the Public Transportation Services"),
xlab = "Year",
ylab = "Public Transportation Services Usage",
col = "black")
MA3_forecast <- ma(Transit_ts,order=3)
lines(MA3_forecast, col = "red")
MA6_forecast <- ma(Transit_ts,order=6)
lines(MA6_forecast, col = "blue")
MA9_forecast <- ma(Transit_ts,order=9)
lines(MA9_forecast, col = "green")

# It is observed that the moving average of the order 9 is the best fit that has the best smoothing. So in forecasting the next 12 months, it will be the best order fit to be used.
# Moving Average with the Order = 9
MA9_model <- ma(Transit_ts, order = 9, centre = FALSE)
MA9_f <- forecast(MA9_model, h = 12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
MA9_f
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Apr 2023 548293.2 537051.7 559534.6 531100.9 565485.4
## May 2023 544672.6 521526.8 567818.3 509274.2 580070.9
## Jun 2023 541776.1 505888.4 577663.8 486890.5 596661.6
## Jul 2023 539458.9 490582.2 588335.6 464708.4 614209.4
## Aug 2023 537605.2 475825.9 599384.4 443122.0 632088.4
## Sep 2023 536122.2 461715.7 610528.7 422327.2 649917.1
## Oct 2023 534935.8 448281.0 621590.5 402408.7 667462.8
## Nov 2023 533986.6 435514.7 632458.6 383386.8 684586.5
## Dec 2023 533227.3 423389.4 643065.2 365244.8 701209.9
## Jan 2024 532619.9 411867.7 653372.1 347945.3 717294.5
## Feb 2024 532133.9 400907.4 663360.4 331440.3 732827.5
## Mar 2024 531745.2 390465.9 673024.4 315677.2 747813.1
plot(MA9_f)

accuracy(MA9_f)
## ME RMSE MAE MPE MAPE MASE
## Training set -195.8352 8691.624 6867.204 0.00766116 0.9437753 0.1484077
## ACF1
## Training set 0.06643575
# Based on the moving average of order 9 forecast, in 1 year the public transportation usage will be 531745.
MA3_model <- ma(Transit_ts, order = 3, centre = FALSE)
MA3_f <- forecast(MA3_model, h = 12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
MA3_f
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jul 2023 567792.1 539434.8 596149.3 524423.4 611160.8
## Aug 2023 579627.0 537198.3 622055.6 514737.9 644516.0
## Sep 2023 610628.4 556179.1 665077.7 527355.4 693901.4
## Oct 2023 610519.2 545147.1 675891.3 510541.2 710497.2
## Nov 2023 593501.9 517978.7 669025.2 477999.1 709004.8
## Dec 2023 561566.1 476507.0 646625.3 431479.4 691652.9
## Jan 2024 547527.0 453454.2 641599.8 403655.1 691398.9
## Feb 2024 565911.6 463283.3 668539.8 408955.2 722867.9
## Mar 2024 576935.3 466160.5 687710.1 407519.9 746350.8
## Apr 2024 598185.3 479632.9 716737.8 416875.0 779495.7
## May 2024 584492.6 458497.1 710488.1 391799.1 777186.1
## Jun 2024 577737.2 444603.7 710870.7 374127.1 781347.3
plot(MA3_f)

accuracy(MA3_f)
## ME RMSE MAE MPE MAPE MASE
## Training set -474.3732 21447.51 9418.979 -0.06588614 1.651357 0.1843854
## ACF1
## Training set 0.5855302
MA6_model <- ma(Transit_ts, order = 6, centre = FALSE)
MA6_f <- forecast(MA6_model, h = 12)
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
MA6_f
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## May 2023 566436.9 551704.6 581169.3 543905.7 588968.1
## Jun 2023 568133.7 541205.3 595062.0 526950.3 609317.1
## Jul 2023 569491.1 528763.0 610219.2 507202.9 631779.3
## Aug 2023 570577.1 515573.9 625580.2 486457.0 654697.1
## Sep 2023 571445.8 502182.1 640709.5 465516.1 677375.5
## Oct 2023 572140.8 488882.8 655398.8 444808.7 699473.0
## Nov 2023 572696.8 475844.2 669549.5 424573.5 720820.2
## Dec 2023 573141.6 463160.8 683122.4 404940.5 741342.8
## Jan 2024 573497.5 450882.2 696112.7 385973.7 761021.3
## Feb 2024 573782.1 439029.5 708534.7 367695.8 779868.4
## Mar 2024 574009.9 427605.8 720413.9 350104.2 797915.5
## Apr 2024 574192.1 416603.0 731781.1 333180.4 815203.8
plot(MA6_f)

accuracy(MA6_f)
## ME RMSE MAE MPE MAPE MASE
## Training set -186.0535 11391.86 8072.671 0.03998999 1.175111 0.1671021
## ACF1
## Training set 0.02346138
# As the moving average order goes up, the smoothing is stronger. At the 9th order, it produces a smoother trend and less random variation. It has has the best fit and lowest errors. This is also proved by the MAPE as it moves from 1.65 -> 1.18 -> 0.94 respectively. A MAPE below 0 means that it is statistically an extremely close prediction with extremely accurate relative error. This order will be great for long-term forecasting but not for rapid shifts.
# Simple Smoothing
ses_model <- ses(Transit_ts,12)
plot(ses_model)

summary(ses_model)
##
## Forecast method: Simple exponential smoothing
##
## Model Information:
## Simple exponential smoothing
##
## Call:
## ses(y = Transit_ts, h = 12)
##
## Smoothing parameters:
## alpha = 0.8374
##
## Initial states:
## l = 732001.6666
##
## sigma: 55323.68
##
## AIC AICc BIC
## 7782.916 7783.002 7793.852
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -812.0035 55127.84 40313.36 -0.8363806 6.026904 0.7092143
## ACF1
## Training set 0.01607982
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2023 539578.3 468678.2 610478.5 431145.9 648010.7
## Sep 2023 539578.3 447103.9 632052.8 398150.9 681005.8
## Oct 2023 539578.3 429686.5 649470.1 371513.3 707643.4
## Nov 2023 539578.3 414674.8 664481.9 348554.8 730601.9
## Dec 2023 539578.3 401283.0 677873.6 328073.9 751082.8
## Jan 2024 539578.3 389078.2 690078.4 309408.3 769748.4
## Feb 2024 539578.3 377791.5 701365.1 292146.7 787009.9
## Mar 2024 539578.3 367242.4 711914.2 276013.3 803143.4
## Apr 2024 539578.3 357302.8 721853.8 260812.0 818344.6
## May 2024 539578.3 347877.9 731278.7 246397.9 832758.8
## Jun 2024 539578.3 338895.2 740261.5 232659.9 846496.8
## Jul 2024 539578.3 330297.6 748859.1 219511.1 859645.6
# The value of alpha is 0.8374. Since the alpha is closer to 1 indicating that the model reacts quickly to recent changes, therefore the most weight is on the latest observed values which dominate the forecast.
# The initial state value is 732001.6666.
# The value of sigma is 55323.68. Sigma gives the typical deviation of actual values from fitted values. Based on the initial state, the forecast deviates from the actual values by 7.5% which is considered to be a good forecast. The deviation in this case is considered moderate and therefore it can be used for practical predictions centered around the usage fluctuations of the public transportation services.
checkresiduals(ses_model)

##
## Ljung-Box test
##
## data: Residuals from Simple exponential smoothing
## Q* = 249.65, df = 24, p-value < 2.2e-16
##
## Model df: 0. Total lags used: 24
ses_residuals <- residuals(ses_model)
ses_residuals
## Jan Feb Mar Apr May
## 2000 -7067.66664 30452.53069 92146.73071 -71285.46416 49194.29336
## 2001 34614.16470 -33450.43055 80048.69764 -29939.06089 42066.77886
## 2002 26455.00224 -33605.41828 58583.49079 14133.88642 15815.70332
## 2003 16801.87805 -43811.38056 80896.61617 3702.84266 -10116.77805
## 2004 -15183.12841 6966.65034 94144.04025 -24627.62625 -26209.38142
## 2005 -17134.47941 -8280.71297 90104.24357 -11344.64964 -12121.06816
## 2006 20864.68426 -27764.61495 107973.42583 -49964.44626 55374.89548
## 2007 25230.53170 -37530.56340 132079.11406 -29172.95166 104594.37713
## 2008 31730.21001 -11068.47042 52238.84901 26851.00824 8639.98724
## 2009 -12348.05617 -23081.25992 87819.11610 -9538.29761 -18632.28715
## 2010 -30940.70134 -47748.12568 140019.35285 -1140.56918 -17114.49959
## 2011 -24465.23459 -17266.97041 124644.73899 -30696.06160 18767.66198
## 2012 -1102.31972 16384.72121 56627.77400 -34693.18693 24794.57855
## 2013 31905.99626 -45931.88094 67886.74278 30039.94628 5712.62891
## 2014 -12337.27722 -21764.50686 97417.26984 23495.72441 6838.29147
## 2015 -46182.73894 -45545.05620 120255.65508 13740.10772 -27866.33996
## 2016 -56823.32412 27236.38310 79310.66374 -23402.09305 4987.93652
## 2017 -11348.56932 -26717.70564 104063.69279 -34628.31640 35440.12895
## 2018 3677.01227 -22042.97904 71508.97975 -379.94129 28787.20721
## 2019 -19.88805 -32937.23455 86388.16268 34607.97537 14045.56283
## 2020 17623.99339 -23549.67358 -286769.06564 -380849.47215 -32804.49666
## 2021 -17972.10049 -19952.94177 69313.89868 15314.05569 24168.64338
## 2022 -61534.08818 24765.23509 89308.76182 8763.97500 11825.35307
## 2023 -44.74156 -12937.27667 78516.91059 -20463.18727 45724.91351
## Jun Jul Aug Sep Oct
## 2000 -16334.15141 -52821.54944 34727.22268 21099.95694 52170.64927
## 2001 -36537.35424 -28229.35265 20335.84194 -38120.62467 96757.14783
## 2002 -63661.76979 455.19377 -1165.96832 23217.36955 75869.02047
## 2003 -47331.36990 5415.13307 -20774.29501 57990.31565 33077.41376
## 2004 -837.63451 -30264.23098 13610.89393 43696.64500 38047.72350
## 2005 -13019.34311 -43279.43651 47053.13035 58430.61468 9356.02300
## 2006 -33686.95226 -42957.76971 43034.44551 17033.02487 59526.21264
## 2007 -99662.00710 -32776.80340 42534.25484 1884.67499 44032.51928
## 2008 -27081.81202 22165.47535 -16476.05723 35844.37132 48412.64747
## 2009 -11110.31303 -7455.95618 -19012.61985 46989.83052 50080.31974
## 2010 -13166.46350 -25663.36384 21653.16856 23319.62235 27606.65063
## 2011 -16098.66991 -39264.25126 12142.15099 50347.77197 35940.44775
## 2012 -44629.46583 -10039.43537 43098.20894 -9095.60479 45976.71141
## 2013 -55922.91096 7759.82426 21231.04026 15892.96836 74644.79642
## 2014 -48137.83574 3705.97128 1900.73079 44182.13056 62561.68177
## 2015 2406.88150 12069.44976 -47444.05199 48839.80670 63591.19569
## 2016 -11515.77329 -60308.89932 45587.49704 27044.24740 9260.41522
## 2017 -27248.09759 -56354.56896 37948.62038 13110.87778 68114.32350
## 2018 -30999.11374 -30244.62574 31070.08248 -11726.83207 103146.77456
## 2019 -58909.66143 12172.06592 15953.63727 12880.66348 71265.88197
## 2020 50274.75087 58681.57176 21451.83809 15899.87842 22903.92026
## 2021 39818.73349 19886.02874 8502.21873 36733.78162 42774.29915
## 2022 7345.24867 -25876.38537 33263.52322 35977.90417 25147.36495
## 2023 -28909.40350 -45329.75999
## Nov Dec
## 2000 -26650.08361 -65414.30932
## 2001 -35616.63634 -69688.60917
## 2002 -76023.83413 -46853.34441
## 2003 -100420.36425 17565.85911
## 2004 -41794.00432 -36544.28232
## 2005 -32010.35758 -69340.09215
## 2006 -45838.79143 -54990.11735
## 2007 -68880.65067 -73462.59321
## 2008 -124777.27653 -23420.49400
## 2009 -79269.05008 -28251.13900
## 2010 -46253.11679 -32601.50229
## 2011 -50843.72688 -31778.10873
## 2012 -54449.45166 -47221.53565
## 2013 -97885.93916 -41006.94772
## 2014 -134687.10959 505.79316
## 2015 -94371.67178 -13049.39523
## 2016 -47603.90702 -40117.19176
## 2017 -68001.04050 -69067.53541
## 2018 -92144.44191 -64154.16360
## 2019 -89788.47816 -37352.99498
## 2020 -36454.95822 -7501.95194
## 2021 -24574.28474 -22676.70608
## 2022 -30023.08945 -33649.88735
## 2023
plot(ses_residuals,
main = "Residuals from Simple Exponential Smoothing Forecast for the Usage of Public Transportation Services",
ylab = "Residuals", xlab = "Time",
col = "blue", type = "l")
abline(h = 0, col = "red")

# The plot indicates that the model captured the main pattern of the public transportation usage, because the pattern oscillates around 800,000 transit users except for the sharp drop around 2020 which is a huge negative deviation from the norm.
hist(ses_residuals,
main = "Histogram of Residuals",
xlab = "Residuals",
col = "seagreen")

# The histogram roughly forms a bell-shaped curve centered around 0, however due to the sudden drop around 2020 and the gradual recovery is still below the pre-2020 levels explaining the lower outliers in the boxplot earlier, resulting in the negatively skewed AKA left skewed distribution of the histogram.
plot(fitted(ses_model), residuals(ses_model),
main = "Residuals vs Fitted Values",
xlab = "Fitted Values",
ylab = "Residuals",
col = "steelblue", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# The cluster at the higher values suggests that the data points around and above 800,000 is where the model consistently predicts well and captures a stable pattern which indicates smaller variation. The random scatter at lower fitted values could indicate greater variation in those years due to the external sudden shock like the COVID-19 pandemic in 2020 and the post-pandemic recovery years. At the lower values, we observe heteroscedasticity or non-constant variance because the scatterplot is wider and more random at those values.
plot(Transit_ts, residuals(ses_model),
main = "Residuals vs Actual Values",
xlab = "Actual Values",
ylab = "Residuals",
col = "navy", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# This plot is similar to the fitted vs. residuals plot meaning that the model fits the data well. It indicates higher variation at lower values and smaller variation at higher values during the stable years. The clustering around the higher values indicates there is a model misfit in this case due to heteroscedasticity.
Acf(ses_residuals)

# The bars outside of the blue dashed line which is the approximate 95% confidence interval, indicate significantly autocorrelation at that lag. There are significant spikes at lag 12 and lag 24 which suggests that there are patterns that are left unexplained by the naive model. The spikes at some lags indicate seasonality or external shocks which might be better captured by other models.
accuracy(ses_model)
## ME RMSE MAE MPE MAPE MASE
## Training set -812.0035 55127.84 40313.36 -0.8363806 6.026904 0.7092143
## ACF1
## Training set 0.01607982
ses_forecast <- ses(Transit_ts,12)
ses_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2023 539578.3 468678.2 610478.5 431145.9 648010.7
## Sep 2023 539578.3 447103.9 632052.8 398150.9 681005.8
## Oct 2023 539578.3 429686.5 649470.1 371513.3 707643.4
## Nov 2023 539578.3 414674.8 664481.9 348554.8 730601.9
## Dec 2023 539578.3 401283.0 677873.6 328073.9 751082.8
## Jan 2024 539578.3 389078.2 690078.4 309408.3 769748.4
## Feb 2024 539578.3 377791.5 701365.1 292146.7 787009.9
## Mar 2024 539578.3 367242.4 711914.2 276013.3 803143.4
## Apr 2024 539578.3 357302.8 721853.8 260812.0 818344.6
## May 2024 539578.3 347877.9 731278.7 246397.9 832758.8
## Jun 2024 539578.3 338895.2 740261.5 232659.9 846496.8
## Jul 2024 539578.3 330297.6 748859.1 219511.1 859645.6
plot(ses_forecast)

# The MAPE for this type of forecast is 6.026904 which is less than 10%, so this is a good indicator that the simple smoothing forecast method is overall performing well especially in the pre-COVID-19 time, and the data does not have extremely small values that impact the percentages except for the years of 2020-2021.
# In 1 year, the time series value for the usage of public transportation services will be 539578.
# The forecast is taking the last value and predicting values close to that for the upcoming whole year.
# Holt-Winters
HW_model <- HoltWinters(Transit_ts)
HW_model
## Holt-Winters exponential smoothing with trend and additive seasonal component.
##
## Call:
## HoltWinters(x = Transit_ts)
##
## Smoothing parameters:
## alpha: 0.9284877
## beta : 0
## gamma: 1
##
## Coefficients:
## [,1]
## a 551045.439
## b 1531.450
## s1 -9283.195
## s2 7191.342
## s3 51986.625
## s4 -14146.686
## s5 -39238.939
## s6 -36522.821
## s7 -43226.301
## s8 23642.699
## s9 -2666.430
## s10 25685.841
## s11 4968.181
## s12 -18839.439
plot(HW_model)

# Alpha = 0.9284877, therefore the level updates very quickly and the dataset is highly sensitive to recent changes in values. With the consideration of the post-2020 recovery efforts, an alpha of this scale is helpful to detect trends and turning points faster while giving less weight to the pre-COVID era in public transportation usage.
# Beta = 0; The beta component being equal to 0 means that there is no meaningful trend. The trend does not change over time and stays constant except for the noticeable trend drop in 2020.
# Gamma = 1; the model completely relies on the most recent seasonal observation and it updates fully to match the last season's deviation. In this case, it means that the past seasonal values like the pre-2020 seasonal values hold no influence on the incoming data.
# a = 551045.439 is the initial level of the series; it is the baseline around which the trend and seasonality of the series oscillates.
# b = 1531.450 is the initial trend. This is a positive trend which states that per period there is an increase of 1531.450 units on the top of the a value above.
# s1 - s12 values are the seasonal indices for months 1 to 12. These represent the additive seasonal adjustments relative to the baseline. For certain months, the value is positive indicating an above average value and for others, the value is negative indicating a below average value.
checkresiduals(HW_model)

##
## Ljung-Box test
##
## data: Residuals from HoltWinters
## Q* = 62.11, df = 24, p-value = 3.198e-05
##
## Model df: 0. Total lags used: 24
HW_residuals <- residuals(HW_model)
HW_residuals
## Jan Feb Mar Apr May
## 2001 12443.7337 2031.1375 -1849.7824 -2238.2324 -682.6780
## 2002 -7837.7936 2497.3880 -22978.9064 43842.5551 -30917.5780
## 2003 -18989.5444 -7675.1911 2088.4968 26796.6280 -51026.3047
## 2004 -53984.8072 44993.1036 10145.7022 -4879.1173 -59211.2151
## 2005 -42161.5620 23830.4765 9564.3313 10093.7664 -42327.0593
## 2006 2811.7704 -3336.6186 29679.7552 -30031.5953 32329.2171
## 2007 4492.0584 -13252.7723 52303.5532 -7272.6228 75335.2001
## 2008 12318.8458 13980.9119 -34580.9855 59783.4945 -30842.9375
## 2009 -38251.4800 5401.6556 5812.2635 13747.8898 -49008.1467
## 2010 -52030.5608 -20348.4520 60153.4517 16974.4564 -44137.5817
## 2011 -40608.5445 7630.7030 36065.4312 -8715.3049 -915.4546
## 2012 -13905.1547 35942.6252 -37433.3509 -3703.9972 5562.1304
## 2013 22029.7338 -32737.3171 -15703.5269 58339.8893 -20273.6128
## 2014 -24667.2299 -797.2070 10962.4577 43604.8221 -13515.5216
## 2015 -60413.9964 -23336.5383 35215.6590 29534.8941 -43554.8891
## 2016 -63295.0627 48317.9373 -16629.3675 -4249.4166 -1849.0659
## 2017 -10134.1860 -17212.1061 16856.9596 -17992.0519 29274.5127
## 2018 7277.1760 -13229.8222 -18609.6142 21506.7848 16496.0249
## 2019 -124.4162 -22585.6250 -2242.9477 52376.3878 -1370.0893
## 2020 12714.2994 -13543.2476 -377735.0577 -333149.8536 -4377.4099
## 2021 -27373.9872 -5146.3151 4407.6052 29240.6430 -9431.3070
## 2022 -66346.2381 42280.1402 19837.9557 18766.2356 -18692.8370
## 2023 1271.3492 -8355.4911 13161.7786 -9106.5517 20645.6999
## Jun Jul Aug Sep Oct
## 2001 -1364.5202 23607.0531 -14899.4233 -59158.2334 50508.4274
## 2002 -26332.9261 53131.7323 -36201.3616 7318.4644 16555.8156
## 2003 -7550.8054 51096.1869 -50373.5167 41154.7804 -30132.0569
## 2004 37865.8420 6909.0551 -6089.1105 17890.2630 -19402.8723
## 2005 17658.0664 -3356.2588 29665.3219 28027.3241 -47099.0318
## 2006 -12937.7665 -358.4541 23447.2538 -13044.2071 12239.3057
## 2007 -80605.1672 14813.9221 20673.8648 -25699.9516 -3503.8163
## 2008 11139.4423 57689.1518 -44625.0884 16503.6691 -3216.0358
## 2009 27245.1523 23293.0326 -37487.1101 24200.6665 -1400.4055
## 2010 19837.4117 5172.9133 9190.6045 -7156.7369 -20436.0538
## 2011 9585.3274 -7635.5450 688.3286 21979.2814 -13464.8373
## 2012 -20213.3806 25329.9856 28653.5091 -42021.4295 4197.0163
## 2013 -28407.6879 41123.0831 4783.2366 -10129.3220 27776.5075
## 2014 -19740.7154 31450.5754 -11980.7218 21171.2802 10620.1527
## 2015 34745.2792 31829.1871 -59552.7762 28081.5205 12140.8546
## 2016 12862.3349 -39641.1754 46674.1473 -7514.8787 -40050.7713
## 2017 -6648.9932 -30768.8299 32854.0251 -17319.0072 22694.8285
## 2018 -7662.8407 -2396.4684 19208.1826 -38425.8653 57395.9496
## 2019 -32725.8318 42390.6103 -1148.9896 -8694.1117 16786.7000
## 2020 86105.4279 74268.7557 2178.3038 -5791.6007 -33680.9253
## 2021 59551.3801 36121.2829 -2034.3571 17417.0883 -13544.7909
## 2022 23489.9594 -5263.0116 29938.7545 13529.5409 -29112.6443
## 2023 -18481.4400 -19765.2913
## Nov Dec
## 2001 -10857.4785 -3214.5598
## 2002 -45907.0919 23139.4007
## 2003 -62215.9835 85422.4444
## 2004 -1336.3212 15573.1290
## 2005 9841.0839 -19223.7825
## 2006 -11975.4138 -2106.4353
## 2007 -31891.5907 -18961.4390
## 2008 -86449.9370 35345.3060
## 2009 -35130.8528 18073.3966
## 2010 2591.0055 7246.2158
## 2011 -4503.1841 7975.9897
## 2012 -9119.0131 -8103.5126
## 2013 -54579.5311 1851.8823
## 2014 -84752.9544 43156.5748
## 2015 -37675.8592 17264.0648
## 2016 17758.3055 -17374.5910
## 2017 -11857.6726 -42613.0637
## 2018 -36746.1896 -33353.5102
## 2019 -25376.9317 -6365.0871
## 2020 35228.2310 17786.3451
## 2021 40389.3620 2295.7379
## 2022 32837.1164 -5607.1791
## 2023
sigma <- sd(HW_residuals, na.rm = TRUE)
sigma
## [1] 43052.6
# Sigma = 43052.6; Based on the alpha value, the model deviates by about 43,052 units, which means that the relative error is 7.8%. Since the forecasting model relative error is under 10%, the model is reasonably accurate and most forecasts are within +/- 7.8% of the actual values. When it comes down to budget planning for the public transportation services, this level of standard deviation is acceptable and the city officials does not need a funding or service improvements based on this sigma.
plot(HW_residuals,
main = "Residuals from Holt-Winters Forecast for the Usage of the Public Transportation Services",
ylab = "Residuals", xlab = "Time",
col = "blue", type = "l")
abline(h = 0, col = "red")

# The plot indicates that the model captured the main pattern of the public transportation usage, because the pattern oscillates around 800,000 transit users except for the sharp drop around 2020 which is a huge negative deviation from the norm.
hist(HW_residuals,
main = "Histogram of Residuals",
xlab = "Residuals",
col = "lightblue")

# The histogram roughly forms a bell-shaped curve centered around 0, however due to the sudden drop around 2020 and the gradual recovery is still below the pre-2020 levels explaining the lower outliers in the boxplot earlier, resulting in the negatively skewed AKA left skewed distribution of the histogram.
plot(fitted(HW_model), residuals(HW_model),
main = "Residuals vs Fitted Values",
xlab = "Fitted Values",
ylab = "Residuals",
col = "steelblue", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# The xhat plot follows the shape of the actual data very closely, therefore it can be interpreted that the model captures the overall pattern well. The level graph follows the original time series pattern exactly including the 2020 sharp decline. The trend line is flat meaning the model does not have a strong upward or downward growth, therefore it stays constant. The seasonality oscillates between -40K - 40K at cyclical regular intervals suggesting a stable and strong seasonal effect over time.
plot(Transit_ts, residuals(HW_model),
main = "Residuals vs Actual Values",
xlab = "Actual Values",
ylab = "Residuals",
col = "navy", pch = 19)
abline(h = 0, col = "red", lwd = 2)

# This plot is similar to the fitted vs. residuals plot meaning that the model fits the data well. It indicates higher variation at lower values and smaller variation at higher values during the stable years. The clustering around the higher values indicates there is a model misfit in this case due to heteroscedasticity.
Acf(HW_residuals)

# ACF plot has most of the spikes between the blue dashed line which indicates there are no significant patterns left unexplained by the model. The model mostly captures the white noise, indicating that it is a good fit for the dataset.
HW_forecast <- forecast(HW_model,12)
HW_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 2023 543293.7 488119.6 598467.8 458912.1 627675.2
## Sep 2023 561299.7 486009.9 636589.4 446154.0 676445.4
## Oct 2023 607626.4 516561.0 698691.8 468353.9 746898.9
## Nov 2023 543024.6 438538.8 647510.3 383227.4 702821.7
## Dec 2023 519463.8 403095.3 635832.2 341493.5 697434.0
## Jan 2024 523711.3 396565.8 650856.8 329259.0 718163.6
## Feb 2024 518539.3 381461.4 655617.2 308896.8 728181.8
## Mar 2024 586939.7 440602.1 733277.4 363135.6 810743.8
## Apr 2024 562162.1 407116.7 717207.4 325040.6 799283.5
## May 2024 592045.8 428756.4 755335.2 342316.2 841775.3
## Jun 2024 572859.6 401722.8 743996.3 311128.5 834590.6
## Jul 2024 550583.4 371943.7 729223.1 277377.6 823789.2
accuracy(HW_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set -2603.178 43051.87 26226.83 -0.8402477 4.26629 0.4613965
## ACF1
## Training set 0.06361874
plot(HW_forecast)

# The MAPE for this type of forecast is 4.26629 which is less than 10% so this is a good indicator that the Holt-Winters forecast method is overall performing well especially in the pre-COVID-19 time, and the data does not have extremely small values that impact the percentages except for the years of 2020-2021. Considering the trend and seasonal component, this model can be used to model our dataset.
# In 1 year, the time series value for the usage of the public transportation services will be 550583.4.
# The forecast is mimicking the original plot, so it is adding some noise back into the forecast, therefore the forecast takes the trend of the overall plot of the actual dataset as it looks to forecast the upcoming year.
# Accuracy Summary
Naive <- accuracy(naive_model)
MA3 <- accuracy(MA3_f)
MA6 <- accuracy(MA6_f)
MA9 <- accuracy(MA9_f)
SES <- accuracy(ses_model)
HoltWinters <- accuracy(HW_forecast)
models <- list(
Naive = Naive,
MA3 = MA3,
MA6 = MA6,
MA9 = MA9,
SES = SES,
HoltWinters = HoltWinters
)
accuracy_table <- do.call(rbind, lapply(names(models), function(name) {
acc <- models[[name]][1, c("RMSE", "MAE", "MAPE")]
df <- data.frame(
Model = name,
RMSE = acc["RMSE"],
MAE = acc["MAE"],
MAPE = acc["MAPE"],
stringsAsFactors = FALSE
)
rownames(df) <- NULL
return(df)
}))
rownames(accuracy_table) <- NULL
accuracy_table
# I choose MAPE as my accuracy measure because it is scale-independent due to this reason we can focus on relative errors based on the initial levels for the data set making it the best measure to be considered. Based on the summary table, the Moving Average Method of the order 9 has the lowest MAPE making it the best forecast for the short-term forecast of the data otherwise Holt-Winters would be the best in order to forecast long-term trend and seasonality and the Naive Method has the largest MAPE making it the worst forecast for this data.
# Conclusion
# The time series has a steady trend and seasonality except for the sharp drop in 2020-2021 due to the COVID-19 pandemic and a reduction in mobility. For short-term future planning, the Moving Average forecast of the order 9 should be used in order to conduct resource allocation which can prevent wasted resources in low-demand/high-demand months appropriately and ensures capacity planning for the upcoming year. For long-term strategic planning, the Holt-Winters model should be used to understand the potential growth/decline over months/years taking trend and seasonality in consideration. This can be used to make any type of infrastructure improvements projects or route expansions.
# As the public transportation service is recovering from the post-pandemic effect, over the next year I anticipate for the value of the time series to go up. In 2 years, I also anticipate for the value to continue going up as more companies are implementing back to office mandates and school/colleges are back to normal schedules.